On Using SVM and Kolmogorov Complexity for Spam Filtering
Authors
Abstract
As a side effect of e-marketing strategies, the number of spam e-mails is rocketing, and so are the time and cost needed to deal with them. Spam filtering is one of the most difficult text categorization tasks, a sad consequence of spammers' dynamic efforts to escape filtering. In this paper, we investigate the use of Kolmogorov complexity theory as a backbone for spam filtering, avoiding the burden of text analysis and of updating keywords and blacklists. Exploiting the fact that we can estimate a message's information content through compression techniques, we represent an e-mail as a multidimensional real vector and then implement a support vector machine classifier to classify new incoming e-mails. Our first results exhibit interesting accuracy rates and emphasize the relevance of our idea.
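The abstract does not say how the real vector is built, so the following is only a minimal sketch under stated assumptions: the compressed length produced by `zlib` stands in for Kolmogorov complexity (which is not computable), and each coordinate is the normalized compression distance (NCD) between the e-mail and one reference message. The helper names (`compressed_size`, `ncd`, `to_vector`) and the two reference messages are hypothetical and are not taken from the paper.

```python
import zlib
from typing import List


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a computable stand-in for
    the (uncomputable) Kolmogorov complexity of `data`."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.
    Smaller values mean the two strings share more structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def to_vector(message: str, references: List[str]) -> List[float]:
    """Map a message to a real vector with one NCD coordinate per
    reference message (an assumed construction, not the paper's)."""
    m = message.encode("utf-8")
    return [ncd(m, r.encode("utf-8")) for r in references]


# Hypothetical reference messages: one spam-like, one legitimate.
references = [
    "Win a free prize now! Click here to claim your free prize today!",
    "Hi team, the project meeting is moved to Tuesday at 10am.",
]

vec = to_vector("Claim your free prize now, click here!", references)
print(vec)  # one distance per reference message
```

Vectors of this kind, computed for a labeled training set, could then be passed to any SVM implementation (for example `sklearn.svm.SVC` in scikit-learn) to train the classifier; that step is left out here to keep the sketch dependent on the standard library only.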
Related resources
Instance-Based Spam Filtering Using SVM Nearest Neighbor Classifier
In this paper we evaluate an instance-based spam filter based on the SVM nearest neighbor (SVM-NN) classifier, which combines the ideas of SVM and k-nearest neighbor. To label a message, the classifier first finds the k nearest labeled messages; an SVM model is then trained on these k samples and used to label the unknown sample. Here we present preliminary results of the comparison of SVM-NN wi...
A comparative study for content-based dynamic spam classification using four machine learning algorithms
The growth in the number of email users has resulted in a dramatic increase in spam emails during the past few years. In this paper, four machine learning algorithms, namely Naïve Bayesian (NB), neural network (NN), support vector machine (SVM) and relevance vector machine (RVM), are proposed for spam classification. An empirical evaluation for them on the benchmark spam filtering corpora is prese...
Survey of Spam Filtering Techniques and Tools, and MapReduce with SVM
Spam is unsolicited junk email that comes in a variety of shapes and forms. Various techniques are used to filter spam, such as the Naïve Bayesian classifier and Support Vector Machine (SVM). A number of spam filtering tools, either paid or free, are also available. Among all techniques, SVM is the most widely used. SVM is computationally intensive and for training it can’t work...
Support Vector Machines Parameter Selection Based on Combined Taguchi Method and Staelin Method for E-mail Spam Filtering
Support vector machines (SVM) are a powerful tool for building good spam filtering models. However, the performance of the model depends on parameter selection, which can seriously affect classification performance during the training process. In this study, we combine the Taguchi method and the Staelin method to optimize the SVM-based e-mail spam filtering model and promote spam...
SVM-Based Spam Filter with Active and Online Learning
A realistic classification model for spam filtering should take into account not only that spam evolves over time, but also that labeling a large number of examples for initial training can be expensive in terms of both time and money. This paper addresses the problem of separating legitimate emails from unsolicited ones with an active and online learning algorithm, using a Support Vector Mac...
Journal title:
Volume, issue:
Pages: -
Publication date: 2008